Effective Constituent Projection across Languages
Abstract
We describe an effective constituent projection strategy in which constituent projection is performed on the basis of dependency projection. In particular, a novel measure is proposed to evaluate the candidate projected constituents of a target-language sentence, and a PCFG-style parsing procedure is then used to search for the most probable projected constituent tree. Experiments show that the parser trained on the projected treebank can significantly boost the performance of a state-of-the-art supervised parser. When integrated into a tree-based machine translation system, the projected parser yields translation performance comparable to that obtained with a supervised parser trained on thousands of annotated trees.
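The abstract does not spell out the proposed measure or the exact parsing procedure, so the following is only a minimal Python sketch of the general pipeline under loudly stated assumptions. Source dependency heads are projected through a hypothetical one-to-one, total word alignment; each candidate span receives a hypothetical boundary-crossing score (a stand-in, not the paper's measure); and an unlabeled CKY-style dynamic program selects the binary bracketing with the highest product of span scores. The names `project_heads`, `span_score`, and `best_tree` are illustrative only, and the search omits the nonterminal labels and rule probabilities a full PCFG would have.

```python
import math


def project_heads(src_heads, align):
    """Project source dependency heads onto the target sentence.

    Hypothetical direct-correspondence sketch: align[t] is the source index
    aligned to target word t, ASSUMED one-to-one and total. Target word t gets
    as head the target word aligned to the source head of align[t]; -1 = root.
    """
    inverse = {s: t for t, s in enumerate(align)}
    return [-1 if src_heads[s] == -1 else inverse[src_heads[s]] for s in align]


def span_score(heads, i, j):
    """Hypothetical plausibility score for words i..j-1 forming a constituent.

    Counts dependency edges crossing the span boundary: words inside whose
    head lies outside (the root's -1 counts as outside) plus words outside
    whose head lies inside. A complete dependency subtree has exactly one.
    """
    inside_out = sum(1 for k in range(i, j) if not i <= heads[k] < j)
    outside_in = sum(1 for k in range(len(heads))
                     if not i <= k < j and i <= heads[k] < j)
    return 1.0 / max(1, inside_out + outside_in)


def best_tree(words, heads):
    """Unlabeled CKY-style search: maximize the sum of log span scores.

    Assumes a non-empty sentence. Returns nested tuples of words.
    """
    n = len(words)
    best = {}  # (i, j) -> (best log score over words i..j-1, split point)
    for i in range(n):
        best[(i, i + 1)] = (math.log(span_score(heads, i, i + 1)), None)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Best way to split span (i, j) into two adjacent sub-spans.
            inner, split = max(
                (best[(i, k)][0] + best[(k, j)][0], k) for k in range(i + 1, j)
            )
            best[(i, j)] = (math.log(span_score(heads, i, j)) + inner, split)

    def build(i, j):
        # Follow back-pointers to rebuild the bracketing.
        split = best[(i, j)][1]
        if split is None:
            return words[i]
        return (build(i, split), build(split, j))

    return build(0, n)


if __name__ == "__main__":
    words = ["the", "cat", "slept"]
    heads = project_heads([1, 2, -1], [0, 1, 2])  # identity alignment
    print(best_tree(words, heads))  # (('the', 'cat'), 'slept')
```

Every binary tree over a sentence contains the same single-word spans and the same whole-sentence span, so only the internal spans decide the winner. For flat structures such as subject-verb-object, this crude score can tie between bracketings; the sketch then breaks exact ties toward the later split point, which is one reason a more informative span measure is needed.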